The migration subsystem is composed of the following modules:
- Detection Module. This module monitors memory accesses issued by nodes
in the system to each physical memory page. In Origin systems this
module is mostly implemented in hardware. This detection module informs
the Migration Control Module that a page is experiencing excessive
remote accesses via an interrupt sent to the page's home node.
- Migration Engine Module. This module carries out data movement from a
current physical memory page to a new page in the node issuing the
remote accesses.
- Migration Control Module. This module decides whether the page should
be migrated or not, based on migration control policies, defined by
parameters such as migration threshold, bounce detection and
prevention, dampening factor, and others.
- Migration Control Periodic Operations Module. This module executes all
periodic operations needed for the Migration Control Module.
The basic goal of memory migration is to minimize memory access latency.
In a NUMA system where local memory access latency is smaller than remote
memory access latency, we can achieve this latency minimization goal by
moving the data to the node where most memory references are going to be
issued from.
It would be great to be able to move data to the node where it is going
to be needed right before it is referenced. Unfortunately, we cannot
predict the future. However, common programs usually have some amount of
temporal and spatial locality, which allows us to heuristically predict
future behavior based on recent past behavior.
The usual procedure used to predict future memory accesses to a page is
to count the memory references to this page issued by each node in the
system. If the accumulated number of remote references becomes
considerably greater than the number of accumulated local references,
then it may be beneficial to migrate the page to the remote node issuing
the references, especially if this remote node will continue accessing
this same page for a long time.
Origin systems have counters that continuously monitor all memory
accesses issued by each node in the system to each physical memory page.
In a 64-node Origin (128 processors), we have 64 memory access counters
for every 4-KB low-level physical page (4 KB is the size of a low-level
physical page; software page sizes start at 16 KB on Origin
systems). For every memory access, the counter associated with the node
issuing the reference is incremented; at the same time, this counter is
compared to the counter that keeps track of local accesses, and if the
remote counter exceeds the local counter by a threshold, an interrupt is
generated advising the Operating System about the existence of a page
with excessive remote accesses.
Upon reception of the interrupt, the Migration Control Module in the
Operating System decides whether to migrate the page or not.
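
The counter update and comparison described above can be modeled in
software as follows. This is only an illustrative sketch, not the Origin
hardware logic: the structure page_counters, the function record_access,
and the constant MAX_NODES are hypothetical names, and counter pegging is
not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 64  /* hypothetical constant: the 64-node example in the text */

/* Hypothetical software model of the per-page hardware counters. */
struct page_counters {
    unsigned home_node;         /* node whose memory holds the page */
    uint32_t count[MAX_NODES];  /* one access counter per node */
};

/*
 * Record one access to the page issued by `node`.
 * Returns true when the remote counter exceeds the local (home node)
 * counter by at least `threshold`, which is the condition under which
 * a migration request interrupt would be raised.
 */
static bool record_access(struct page_counters *pc, unsigned node,
                          uint32_t threshold)
{
    assert(node < MAX_NODES);
    pc->count[node]++;

    if (node == pc->home_node)
        return false;  /* a local access never requests migration */

    uint32_t remote = pc->count[node];
    uint32_t local = pc->count[pc->home_node];

    /* Check remote >= local first to avoid unsigned wrap-around. */
    return remote >= local && (remote - local) >= threshold;
}
```

A caller would invoke record_access() once per memory reference and hand
the page to the migration control logic whenever it returns true.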
The threshold that determines how large the difference between remote and
local counters needs to be in order for the interrupt to be generated is
stored in a per-node hardware register, which is initialized by the
Migration Control Module. The default system threshold, defined in
/var/sysgen/mtune/numa by the tunable variables
numa_migr_default_threshold and numa_migr_threshold_reference (see
Migration Tunables below), and the threshold specified by users as a
parameter of a migration policy (mmci(5)) are not stored directly in
this register, because different pages on the same node may
have different migration thresholds. Instead, these thresholds are used
to initialize the reference counters when a page is initialized.
* numa_migr_default_threshold. This threshold defines the minimum
difference between the local and any remote counter needed to
generate a migration request interrupt.
    if ((remote_counter - local_counter) >=
        ((numa_migr_threshold_reference_value / 100) *
         numa_migr_default_threshold)) {
            send_migration_request_intr();
    }
* numa_migr_threshold_reference. This parameter defines the pegging
value for the memory reference counters. It is machine
configuration dependent. For Origin 2000 systems, it can take the
following values:
    0: MIGR_THRESHREF_STANDARD
    Threshold reference is 2048 (11-bit counters). Maximum threshold
    allowed for systems with STANDARD DIMMS. This is the default.
    1: MIGR_THRESHREF_PREMIUM
    Threshold reference is 524288 (19-bit counters). Maximum threshold
    allowed for systems with *all* PREMIUM DIMMS.
* numa_migr_vehicle. This tunable defines what device the system
should use to migrate a page. A value of 0 selects the Block
Transfer Engine (BTE) and a value of 1 selects the processor. When
the BTE is selected, and the system is equipped with the optional
poison bits, the system automatically uses Lazy TLB Shootdown
Algorithms.
* numa_migr_min_maxradius. This tunable is used if
numa_migr_default_mode has been set to mode 4
(MIGR_DEFMODE_LIMITED). For this mode, migration is normally off for
machine configurations with a maximum Craylink distance less than
numa_migr_min_maxradius. Migration is normally on otherwise.
* numa_migr_auto_migr_mech. This tunable defines the migration
execution mode for memory reference counter triggered migrations: 0
for immediate and 1 for delayed. Only the Immediate Mode (0) is
currently available.
* numa_migr_user_migr_mech. This tunable defines the migration
execution mode for user requested migrations: 0 for immediate and 1
for delayed. Only the Immediate Mode (0) is currently available.
* numa_migr_coaldmigr_mech. This tunable defines the migration
execution mode for memory coalescing migrations: 0 for immediate and
1 for delayed. Only the Immediate Mode (0) is currently available.
* numa_refcnt_default_mode. Extended counters are used in application
profiling (see refcnt(5)) and to control automatic memory migration.
This tunable defines the default extended reference counter mode. It
can take the following values:
    0: REFCNT_DEFMODE_DISABLED
    Extended reference counters are disabled; users cannot access
    the extended reference counters (refcnt(5)). In this case
    automatic memory migration will not be performed regardless of
    any other settings.
    1: REFCNT_DEFMODE_ENABLED
    Extended reference counters are always enabled; users cannot
    disable them.
    2: REFCNT_DEFMODE_NORMOFF
    Extended reference counters are normally disabled; users can
    disable or enable the counters for an application.
    3: REFCNT_DEFMODE_NORMON
    Extended reference counters are normally enabled; users can
    disable or enable the counters for an application.
* numa_refcnt_overflow_threshold. This tunable defines the count at
which the hardware reference counters notify the operating system of
a counter overflow in order for the count to be transferred into the
(software) extended reference counters. It is expressed as a
percentage of the threshold reference value defined by
* numa_migr_min_distance. Minimum distance required by the Node
Distance Filter in order to accept a migration request.
* numa_migr_memory_low_enabled. Enable or disable the Memory Pressure
Filter.
* numa_migr_memory_low_threshold. Threshold at which the Memory
Pressure Filter starts rejecting migration requests to a node. This
threshold is expressed as a percentage of the total amount of
physical memory in a node.
* numa_migr_freeze_enabled. Enable or disable the freezing operation in
* numa_migr_freeze_threshold. Threshold at which a page is frozen. This
tunable is expressed as a percent of the maximum count supported by
the migration counters (7 for Origin 2000).
* numa_migr_melt_enabled. Enable or disable the melting operation in
the Bounce Control Filter.
* numa_migr_melt_threshold. When a migration counter goes below this
threshold a page is unfrozen. This tunable is expressed as a
percent of the maximum count supported by the migration counters (7
for Origin 2000).
* numa_migr_bounce_control_interval. This tunable defines the period
for the loop that ages the migration counters and the dampening
counters. It is expressed in terms of number of mem_ticks. The
mem_tick unit is defined by mem_tick_base_period below. If it is
* mem_tick_base_period. Number of 10 ms system ticks in one mem_tick.
* numa_migr_unpegging_control_enabled. Enable or disable the unpegging
periodic operation.
* numa_migr_unpegging_control_interval. This tunable defines the period
for the loop that unpegs the hardware memory reference counters. It
is expressed in terms of number of mem_ticks. The mem_tick unit is
defined by mem_tick_base_period above. If it is set to 0, we process
counter value at which we consider the counter to be pegged. It is
expressed as a percent of the maximum count defined by
numa_migr_threshold_reference.
* numa_migr_traffic_control_enabled. Enable or disable the Traffic
Control Filter. This is an experimental module, and therefore it
* numa_migr_traffic_control_interval. Traffic control period.
Experimental module.
* numa_migr_traffic_control_threshold. Traffic control threshold for
kicking off the batch migration of enqueued migration requests.
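
Several of the tunables above are expressed as percentages or in
mem_ticks rather than in absolute units. The sketch below shows how such
values might be converted. It assumes that numa_migr_default_threshold
is a percentage, as the division by 100 in the pseudocode for that
tunable suggests, and that all quantities are unsigned integers. The
function names and the _VALUE constants are hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold reference values given above for Origin 2000. */
#define MIGR_THRESHREF_STANDARD_VALUE 2048u    /* 11-bit counters */
#define MIGR_THRESHREF_PREMIUM_VALUE  524288u  /* 19-bit counters */

/*
 * Absolute counter difference needed to raise a migration request,
 * using the same order of operations as the pseudocode above:
 * (reference / 100) * percent. With integer arithmetic the division
 * truncates, so 50 percent of 2048 yields 1000 rather than 1024.
 */
static uint32_t migr_abs_threshold(uint32_t reference_value,
                                   uint32_t percent)
{
    assert(percent <= 100);  /* assumption: the tunable is a percentage */
    return (reference_value / 100u) * percent;
}

/*
 * Convert an interval given in mem_ticks into milliseconds.
 * mem_tick_base_period is the number of 10 ms system ticks
 * in one mem_tick.
 */
static uint32_t mem_ticks_to_ms(uint32_t interval_mem_ticks,
                                uint32_t mem_tick_base_period)
{
    return interval_mem_ticks * mem_tick_base_period * 10u;
}
```

For example, with the standard reference value and a threshold of 50
percent, migr_abs_threshold() returns 1000; an interval of 4 mem_ticks
with a base period of 1 corresponds to 40 ms.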